Information discovery across multiple streams

Authors

  • Vagelis Hristidis
  • Oscar Valdivia
  • Michail Vlachos
  • Philip S. Yu
Abstract

In this paper we address the problem of continuous keyword queries over multiple textual streams and explore techniques for extracting useful information from them. To the best of our knowledge, this is the first approach that performs keyword search across multiple textual streams. The scenario we consider is intuitive. Suppose a research or financial analyst is searching for information on a topic by continuously polling data from multiple (and possibly heterogeneous) text streams, such as RSS feeds and blogs. The topic of interest can be described by several keywords. Current filtering approaches only identify individual text streams that contain some of the keywords. A more flexible and powerful approach is to search across multiple streams, which may collectively answer the analyst's question. We present such a model. It takes into account the continuous flow of text in the streams and uses efficient pipelined algorithms, so that results are output as soon as they are available. The proposed model is evaluated both analytically and experimentally, and the experiments use the ENRON dataset and a variety of blog datasets.
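
To make the query model concrete, the following is a minimal Python sketch under stated assumptions. It is not the authors' algorithm. It hypothetically defines a result as a minimal set of documents that satisfies four conditions: the set holds at most one document per stream, it holds at most `max_docs` documents in total, all of them arrived within a sliding window of `window` time units, and together they contain every query keyword. It also assumes that documents arrive merged in non-decreasing timestamp order, and that each document is reduced to a set of terms. The names `Doc`, `MultiStreamKeywordQuery`, `push`, `window` and `max_docs` are illustrative only. In a pipelined spirit, each call to `push` returns the new results involving the arriving document, so answers are emitted as soon as they become available.

```python
from collections import deque
from dataclasses import dataclass
from itertools import combinations
from typing import Deque, FrozenSet, Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class Doc:
    stream: str             # source stream identifier (e.g. a feed name)
    ts: float               # arrival timestamp
    doc_id: str
    terms: FrozenSet[str]   # terms extracted from the document text


class MultiStreamKeywordQuery:
    """Hypothetical continuous query: report minimal sets of documents from
    distinct streams, all inside a sliding time window, whose combined terms
    contain every query keyword."""

    def __init__(self, keywords: Iterable[str], window: float, max_docs: int = 3):
        self.keywords = frozenset(keywords)
        self.window = window
        self.max_docs = max_docs
        # Relevant documents still inside the window, oldest first.
        self.buffer: Deque[Doc] = deque()

    def _covers(self, docs: Sequence[Doc]) -> bool:
        covered = set()
        for d in docs:
            covered |= d.terms & self.keywords
        return covered >= self.keywords

    def _is_minimal_cover(self, group: Tuple[Doc, ...]) -> bool:
        if not self._covers(group):
            return False
        # Minimal: dropping any single document must break keyword coverage.
        return all(not self._covers(group[:i] + group[i + 1:])
                   for i in range(len(group)))

    def push(self, doc: Doc) -> List[Tuple[str, ...]]:
        # Evict documents that have fallen out of the sliding window
        # (relies on timestamps arriving in non-decreasing order).
        while self.buffer and doc.ts - self.buffer[0].ts > self.window:
            self.buffer.popleft()
        if not (doc.terms & self.keywords):
            return []  # contains no query keyword, so it cannot join any result
        results = []
        partners = [d for d in self.buffer if d.stream != doc.stream]
        # Only combinations that include the new document are enumerated,
        # so every result is reported exactly once, at the moment it forms.
        for size in range(self.max_docs):
            for combo in combinations(partners, size):
                group = (doc,) + combo
                if len({d.stream for d in group}) != len(group):
                    continue  # at most one document per stream
                if self._is_minimal_cover(group):
                    results.append(tuple(d.doc_id for d in group))
        self.buffer.append(doc)
        return results


if __name__ == "__main__":
    q = MultiStreamKeywordQuery({"merger", "lawsuit", "energy"}, window=10.0)
    feed = [
        Doc("rss", 1.0, "r1", frozenset({"energy", "prices"})),
        Doc("blog", 2.0, "b1", frozenset({"merger", "talks"})),
        Doc("email", 3.0, "e1", frozenset({"lawsuit", "filed"})),
    ]
    for d in feed:
        print(d.doc_id, q.push(d))
    # r1 []
    # b1 []
    # e1 [('e1', 'r1', 'b1')]
```

No single stream holds all three keywords in this toy run. The three documents from different streams answer the query jointly, and the answer is reported the moment the last of them arrives. Enumerating buffered partners is exponential in `max_docs`, so the sketch caps the result size. A real system would need a more efficient join strategy.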

Similar resources

Distributed Pattern Discovery in Multiple Streams

Given m groups of streams which consist of n1, . . . , nm coevolving streams in each group, we want to: (i) incrementally find local patterns within a single group, (ii) efficiently obtain global patterns across groups, and more importantly, (iii) efficiently do that in real time while limiting shared information across groups. In this paper, we present a distributed, hierarchical algorithm add...

Introducing New Priority Setting and Resource Allocation Processes in a Canadian Healthcare Organization: A Case Study Analysis Informed by Multiple Streams Theory

Background: In this article, we analyze one case instance of how proposals for change to the priority setting and resource allocation (PSRA) processes at a Canadian healthcare institution reached the decision agenda of the organization’s senior leadership. We adopt key concepts from an established policy studies framework – Kingdon’s multiple streams theory – to inform our analysis. Methods: Tw...

Scalable Maintenance of Knowledge Discovery in an Ontology Stream

In dynamic settings where data is exposed by streams, knowledge discovery aims at learning associations of data across streams. In the semantic Web, streams expose their meaning through evolutive versions of ontologies. Such settings pose challenges of scalability for discovering (a posteriori) knowledge. In our work, the semantics, identifying knowledge similarity and rarity in streams, togeth...

An Advanced Analytical Environment for Scientific Discovery within Continuous, Time-Varying Data-Streams

This paper discusses our recent work across a number of disciplines, leading to a concept for a next generation analytical environment for scientific discovery within continuous, time-varying data-streams. First, we have created a stream-processing engine that processes multiple streams of interest. An analyst, via a client interface, reviews the data-stream format and specifies upstream filter...

Mining Frequent Co-occurrence Patterns across Multiple Data Streams

This paper studies the problem of mining frequent co-occurrence patterns across multiple data streams, which has not been addressed by existing works. Co-occurrence pattern in this context refers to the case that the same group of objects appear consecutively in multiple streams over a short time span, signaling tight correlations between these objects. The need for mining such patterns in real...

Journal:
  • Inf. Sci.

Volume 179

Published: 2009